Efficient and effective metasearch for a large number of text databases
Clement Yu, Weiyi Meng, King-Lup Liu, Wensheng Wu, Naphtali Rishe 
November 1999 Proceedings of the eighth international conference on Information and knowledge management
Publisher: ACM Press 
Full text available: pdf(1.04 MB)



Metasearch engines can be used to facilitate ordinary users for retrieving information from 
multiple local sources (text databases). In a metasearch engine, the contents of each local 
database is represented by a representative. Each user query is evaluated against the set 
of representatives of all databases in order to determine the appropriate databases to 
search. When the number of databases is very large, say in the order of tens of thousands 
or more, then a traditional metasearch engin ... 

22 Evaluating database selection techniques: a testbed and experiment 
James C. French, Allison L. Powell, Charles L. Viles, Travis Emmitt, Kevin J. Prey 
August 1998 Proceedings of the 21st annual international ACM SIGIR conference on 
February 1998 The VLDB Journal — The International Journal on Very Large Data Bases, volume 7 issue 1
Publisher: Springer-Verlag New York, Inc. 

Full text available: pdf(265.48 KB) Additional Information: full citation, abstract, citings, index terms

A path-method is used as a mechanism in object-oriented databases (OODBs) to retrieve 
or to update information relevant to one class that is not stored with that class but with 
some other class. A path-method is a method which traverses from one class through a 
chain of connections between classes and accesses information at another class. However, 
it is a difficult task for a casual user or even an application programmer to write path- 
methods to facilitate queries. This is because it mig ... 

Keywords: Access relevance, Access weight, OODB queries, Object-oriented databases,
Path-method, Traversal algorithms
24 Enhancing relevance feedback in image retrieval using unlabeled data
Zhi-Hua Zhou, Ke-Jia Chen, Hong-Bin Dai 

April 2006 ACM Transactions on Information Systems (TOIS), volume 24 issue 2 
Publisher: ACM Press 

Full text available: pdf(1.23 MB) Additional Information: full citation, abstract, references, index terms

Relevance feedback is an effective scheme bridging the gap between high-level semantics
and low-level features in content-based image retrieval (CBIR). In contrast to previous 
methods which rely on labeled images provided by the user, this article attempts to 
enhance the performance of relevance feedback by exploiting unlabeled images existing in 
the database. Concretely, this article integrates the merits of semisupervised learning and 
active learning into the relevance feedback process. In det ... 

Keywords: Relevance feedback, active learning, content-based image retrieval, machine
learning, learning with unlabeled data, semisupervised learning
26 Application models: Client-system collaboration for legal corpus selection in an online production environment
Jack G. Conrad, Joanne R. S. Claussen
June 2003 Proceedings of the 9th international conference on Artificial intelligence and law

Publisher: ACM Press 

Full text available: pdf(239.10 KB) Additional Information: full citation, abstract, references

The continued growth of very large data environments such as Westlaw and Dialog, in 
addition to the World Wide Web, increases the importance of effective and efficient 
database selection and searching. Current research focuses largely on completely 
autonomous and automatic selection, searching, and results merging in distributed 
environments. This fully automatic approach has significant deficiencies, including reliance 
upon thresholds below which databases with relevant documents are not search ... 

Keywords: database selection, query categorization, user interaction 
April 2006 Proceedings of the 2006 ACM symposium on Applied computing SAC '06
Publisher: ACM Press 
Relevance feedback is the state-of-the-art approach for adjusting query results to the
needs of the users. This work extends the existing framework of image retrieval with 
relevance feedback on the Web by incorporating text and image content into the search 
and feedback process. Some of the most powerful relevance feedback methods are 
implemented and tested on a fully automated Web retrieval system with more than 
250,000 logo and trademark images. This evaluation demonstrates that term re-weight 

Keywords: image retrieval, relevance feedback, world wide web 



29 Querying and web: Efficient query processing in geographic web search engines 
Yen-Yu Chen, Torsten Suel, Alexander Markowetz 

June 2006 Proceedings of the 2006 ACM SIGMOD international conference on 
Management of data SIGMOD '06 

Publisher: ACM Press 

Full text available: pdf(296.76 KB) Additional Information: full citation, abstract, references

Geographic web search engines allow users to constrain and order search results in an 
intuitive manner by focusing a query on a particular geographic region. Geographic search
technology, also called local search, has recently received significant interest from major 
search engine companies. Academic research in this area has focused primarily on 
techniques for extracting geographic knowledge from the web. In this paper, we study the 
problem of efficient query processing in scalable geogr ... 

30 Technical session 1: content-based image retrieval: A novel log-based relevance feedback technique in content-based image retrieval
Chu-Hong Hoi, Michael R. Lyu
October 2004 Proceedings of the 12th annual ACM international conference on Multimedia
Publisher: ACM Press 

Full text available: pdf(228.62 KB) Additional Information: full citation, abstract, references, index terms

Relevance feedback has been proposed as an important technique to boost the retrieval 
performance in content-based image retrieval (CBIR). However, since there exists a 
semantic gap between low-level features and high-level semantic concepts in CBIR, typical 
relevance feedback techniques need to perform a lot of rounds of feedback for achieving 
satisfactory results. These procedures are time-consuming and may make the users bored 
in the retrieval tasks. For a long-term study purpose in CBIR, ... 

Keywords: content-based image retrieval, relevance feedback, support vector machines, 
users logs 



31 Comparing the performance of collection selection algorithms
Allison L. Powell, James C. French 

October 2003 ACM Transactions on Information Systems (TOIS), volume 21 issue 4 
Publisher: ACM Press 

Full text available: pdf(668.40 KB) Additional Information: full citation, abstract, references, citings, index terms

The proliferation of online information resources increases the importance of effective and 
efficient information retrieval in a multicollection environment. Multicollection searching is 
cast in three parts: collection selection (also referred to as database selection), query 
processing and results merging. In this work, we focus our attention on the evaluation of 
the first step, collection selection. In this article, we present a detailed discussion of the 
methodology that we used to evaluate an ... 

Keywords: Collection selection, database selection, distributed information retrieval, 
distributed text retrieval, metasearch engine, resource discovery, resource ranking, 
resource selection, server ranking, server selection, text retrieval 




http://portal.acm.org/results.cfm?query=degree%20of%20relevance%20^^ 9/21/06 



Results (page 2): degree of relevance and database and query 



Page 4 of 6 



32 On the measurement of inter-linker consistency and retrieval effectiveness in hypertext databases
David Ellis, Jonathan Furner-Hines, Peter Willett

August 1994 Proceedings of the 17th annual international ACM SIGIR conference on 
Research and development in information retrieval 

Publisher: Springer-Verlag New York, Inc. 

Full text available: pdf(1.04 MB) Additional Information: full citation, references, citings, index terms

33 LyberWorld — a visualization user interface supporting fulltext retrieval
Matthias Hemmje, Clemens Kunkel, Alexander Willett

August 1994 Proceedings of the 17th annual international ACM SIGIR conference on 
Research and development in information retrieval 

Publisher: Springer-Verlag New York, Inc. 

Full text available: pdf(1.31 MB) Additional Information: full citation, references, citings, index terms



34 Querying and web: To search or to crawl?: towards a query optimizer for text-centric tasks
Panagiotis G. Ipeirotis, Eugene Agichtein, Pranay Jain, Luis Gravano
June 2006 Proceedings of the 2006 ACM SIGMOD international conference on Management of data SIGMOD '06
Publisher: ACM Press 

Full text available: pdf(625.16 KB) Additional Information: full citation, abstract, references, index terms

Text is ubiquitous and, not surprisingly, many important applications rely on textual data 
for a variety of tasks. As a notable example, information extraction applications derive 
structured relations from unstructured text; as another example, focused crawlers explore 
the web to locate pages about specific topics. Execution plans for text-centric tasks follow 
two general paradigms for processing a text database: either we can scan, or "crawl," the
text database or, alternatively, we can exploit ... 

Keywords: focused crawling, information extraction, metasearching, query optimization, 
research, text databases 




35 Query enhancement by user profiles 
Robert R. Korfhage 

July 1984 Proceedings of the 7th annual international ACM SIGIR conference on 
Research and development in information retrieval 

Publisher: British Computer Society 

Full text available: pdf(519.18 KB) Additional Information: full citation, abstract, references, citings

We describe a theoretical model and an on-going series of experiments aimed at a priori 
query enhancement. The model presents a synthesis of concepts from retrospective and 
current awareness retrieval systems, employing the user profile as a factor in interpreting 
a query. It is expected that this will provide a more personalized response to queries, 

36 Building effective queries in natural language information retrieval
Tomek Strzalkowski, Fang Lin, Jose Perez-Carballo, Jin Wang
March 1997 Proceedings of the fifth conference on Applied natural language processing
Publisher: Morgan Kaufmann Publishers Inc. 

Full text available: pdf(771.03 KB) Additional Information: full citation, abstract, references, citings




In this paper we report on our natural language information retrieval (NLIR) project as
related to the recently concluded 5th Text Retrieval Conference (TREC-5). The main thrust
of this project is to use natural language processing techniques to enhance the 
effectiveness of full-text document retrieval. One of our goals was to demonstrate that 
robust if relatively shallow NLP can help to derive a better representation of text 
documents for statistical search. Recently, we have turned our attenti ... 

37 A highly scalable and effective method for metasearch
Weiyi Meng, Zonghuan Wu, Clement Yu, Zhuogang Li 

July 2001 ACM Transactions on Information Systems (TOIS), volume 19 issue 3
Publisher: ACM Press 

Full text available: pdf(653.63 KB) Additional Information: full citation, abstract, references, citings, index terms

A metasearch engine is a system that supports unified access to multiple local search 
engines. Database selection is one of the main challenges in building a large-scale 
metasearch engine. The problem is to efficiently and accurately determine a small number 
of potentially useful local search engines to invoke for each user query. In order to enable 
accurate selection, metadata that reflect the contents of each search engine need to be 
collected and used. This article proposes a highly scalable ... 

Keywords: Database selection, distributed text retrieval, metasearch engine, resource 
discovery 



38 The SIFT information dissemination system
Tak W. Yan, Hector Garcia-Molina 

December 1999 ACM Transactions on Database Systems (TODS), volume 24 issue 4 
Publisher: ACM Press 

Full text available: pdf(220.77 KB) Additional Information: full citation, abstract, references, citings, index terms

Information dissemination is a powerful mechanism for finding information in wide-area
environments. An information dissemination server accepts long-term user queries, 
collects new documents from information sources, matches the documents against the 
queries, and continuously updates the users with relevant information. This paper is a 
retrospective of the Stanford Information Filtering Service (SIFT), a system that as of April 
1996 was processing over 40,000 worldwide subscriptions and ov ... 

Keywords: Boolean queries, dissemination, filtering, indexing, vector space queries 



39 Indexing music and Chinese text: Looking for new, not known music only: music retrieval by melody style
Fang-Fei Kuo, Man-Kwan Shan

June 2004 Proceedings of the 4th ACM/IEEE-CS joint conference on Digital libraries 

Publisher: ACM Press 

Full text available: pdf(526.55 KB) Additional Information: full citation, abstract, references, index terms

With the growth of digital music, content-based music retrieval (CBMR) has attracted 
increasingly attention. For most CBMR systems, the task is to return music objects similar 
to query in syntactic properties such as pitch and interval contour sequence. These 
approaches provide users the capability to look for music that has been heard. However, 
sometimes, listeners are looking, not for music they have been known, but for music that 
is new to them. Moreover, people sometimes want to retrieve mus ... 

Keywords: content-based music retrieval, music classification, music style mining, query 
by melody style 
Dynamic query interpretation in relational databases
A. D'Atri, P. Di Felice, M. Moscarini

June 1987 Proceedings of the sixth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems
Publisher: ACM Press

Full text available: pdf(693.86 KB) Additional Information: full citation, abstract, references, index terms

A new dynamic approach to the problem of determining the correct interpretation of a
logically independent query to a relational database is described. The proposed 
disambiguating process is based on a simple user-system dialogue that consists in a 
sequence of decisions about the relevance (or not) of an attribute with respect to the user 
interpretation 
Some Comments On EQS, A Near Term Natural Language Data Base Query System
William A. Martin 

December 1978 Proceedings of the 1978 annual conference 
Publisher: ACM Press 

Full text available: pdf(665.01 KB) Additional Information: full citation, abstract, references, citings, index terms

Problems and possibilities for near term natural language query systems are discussed, 
with emphasis on the author's own system, EQS. First, the general objectives for near 
term systems in the areas of syntax, world knowledge, discourse, and problem solving are 
considered. Next, a comparison is made between the ATN parsing strategies in LADDER, 
ROBOT, PLANES, and EQS. Evidence for the importance of giving answers to queries not 
directly available in the data base is given together with some ... 

Keywords: Data base query, Natural language, Semantic data models, Semantic
networks
In this paper we examine the connection between two areas of semantics, namely the
semantics of historical databases and the semantics of natural language querying, and link 
them together via a common view of the semantics of time. Since the target application 
domain is an historical database, we present the essential features of the Historical 
Relational Database Model (HRDM), an extension to the relational model motivated by the 
desire to incorporate more "real world" semantics into a database ... 
The use of logic in identifying and analyzing inconsistency in requirements from multiple
stakeholders has been found to be effective in a number of studies. Nonmonotonic logic is
a theoretically well-founded formalism that is especially suited for supporting the evolution
of requirements. However, direct use of logic for expressing requirements and discussing 
them with stakeholders poses serious usability problems, since in most cases stakeholders 
cannot be expected to be fluent with formal log ... 
We introduce a new approach to representing and manipulating various types of non-
singular concepts in natural language discourse. The representation we describe is based 
on a partially ordered structure of levels in which the objects of the same relative 
singularity are assigned to the same level. Our choice of the representation has been 
motivated by the following main concerns: 1. The representation should systematically 
distinguish between those language terms that are used to refer to objec ... 
Spatial relations often are desired answers that a geographic information system (GIS)
should generate in response to a user's query. Current GIS's provide only rudimentary 
support for processing and interpreting natural-language-like spatial relations, because 
their models and representations are primarily quantitative, while natural-language spatial 
relations are usually dominated by qualitative properties. Studies of the use of spatial 
relations in natural language showed that topology ... 
We have developed an approach to natural language processing in which the natural
language processor is viewed as a knowledge-based system whose knowledge is about the
meanings of the utterances of its language. The approach is oriented around the phrase 



http://portal.acm.org/resultsxfm?coll=ACM&dl=ACM&CFro=494414&CFTOKE^^^ 



9/21/06 



Results (page 1): natural and language and term and relavance 



Page 3 of 6 



rather than the word as the basic unit. We believe that this paradigm for language 
processing not only extends the capabilities of other natural language systems, but 
handles those tasks that previous systems could perform in a more systematic a ... 
This paper describes the design of a direct manipulation user interface for Boolean
information retrieval. Intended to overcome the difficulties of manipulating explicit Boolean
queries as well as the "black box" drawbacks of so-called natural language query systems,
the interface presents a two-dimensional graphical representation of a user's natural
language query which not only exposes heuristic query transformations performed by the
system, but also supports query reformulat ...



IR-7 (information retrieval): natural language processing for IR: Distributional term representations: an experimental comparison
Alberto Lavelli, Fabrizio Sebastiani, Roberto Zanoli

November 2004 Proceedings of the thirteenth ACM international conference on 
Information and knowledge management CIKM '04 

Publisher: ACM Press 

Full text available: pdf(185.23 KB) Additional Information: full citation, abstract, references, index terms

A number of content management tasks, including term categorization, term clustering,
and automated thesaurus generation, view natural language terms (e.g. words,
noun phrases) as first-class objects, i.e. as objects endowed with an internal
representation which makes them suitable for explicit manipulation by the corresponding
algorithms. The information retrieval (IR) literature has traditionally used an extensional
(aka distributional) representation for terms ...
One barrier to the acceptance of natural language database query systems is the
substantial installation effort required for each new database. Much of this effort involves
the encoding of semantic knowledge for the domain of discourse, necessary to correctly
interpret and respond to natural language questions. For such systems to be practical,
techniques must be developed to increase their portability to new domains. This paper
discusses several issues involving the portability ... 
We developed a prototype information retrieval system which uses advanced natural
language processing techniques to enhance the effectiveness of traditional key-word based 
document retrieval. The backbone of our system is a statistical retrieval engine which 
performs automated indexing of documents, then search and ranking in response to user 
queries. This core architecture is augmented with advanced natural language processing 
tools which are both robust and efficient. In early experiments, the ... 
Missing information, imprecision, inconsistency, vagueness, uncertainty, and ignorance
abound in information systems. Such imperfection is a fact of life in database systems. 
Although these problems are widely studied in relational database systems, this is not the 
case in conceptual query systems. And yet, concept-based query languages have been 
proposed and some are already commercial products. It is therefore imperative to study 
these problems in concept-based query languages, with a view to ... 
For intelligent interactive systems to communicate with humans in a natural manner, they
must have knowledge about the system users. This paper explores the role of user 
modeling in such systems. It begins with a characterization of what a user model is and 
how it can be used. The types of information that a user model may be required to keep 
about a user are then identified and discussed. User models themselves can vary greatly 
depending on the requirements of the situation and the imple ... 
We report on the joint GE/NYU natural language information retrieval project as related to
the Tipster Phase 2 research conducted initially at NYU and subsequently at GE R&D
Center and NYU. The evaluation results discussed here were obtained in connection with
the 3rd and 4th Text Retrieval Conferences (TREC-3 and TREC-4). The main thrust of this
project is to use natural language processing techniques to enhance the effectiveness of
full-text document retrieval. During the course of the four TR ...
Recent experiments in programming natural language question-answering systems are
reviewed to summarize the methods that have been developed for syntactic, semantic,
and logical analysis of English strings. It is concluded that at least minimally effective 
techniques have been devised for answering questions from natural language subsets in 
small scale experimental systems and that a useful paradigm has evolved to guide 
research efforts in the field. Current approaches to semantic analysis ... 
This study reports the results of using minimum description length (MDL) analysis to
model unsupervised learning of the morphological segmentation of European languages, 
using corpora ranging in size from 5,000 words to 500,000 words. We develop a set of 
heuristics that rapidly develop a probabilistic morphological grammar, and use MDL as our 
primary tool to determine whether the modifications proposed by the heuristics will be 
adopted or not. The resulting grammar matches well the analysis that ... 
In a human dialogue it is usually considered inappropriate if one conversant monopolizes
the conversation. Similarly it can be inappropriate for a natural language database
interface to respond with a lengthy list of data. A non-enumerative "summary" response is
less verbose and often avoids misleading the user where an extensional response might. In
this paper we investigate the problem of generating such discourse-oriented concise 
responses. We present details of the design and implementation o ... 
ABSTRACT



Similarity-based retrieval of images is an important task in many image database applications. A
major class of users' requests requires retrieving those images in the database that are spatially
similar to the query image. We propose an algorithm for computing the spatial similarity between two
symbolic images. A symbolic image is a logical representation of the original image where the image 
objects are uniquely labeled with symbolic names. Spatial relationships in a symbolic image are 
represented as edges in a weighted graph referred to as spatial-orientation graph. Spatial similarity is 
then quantified in terms of the number of, as well as the extent to which, the edges of the spatial- 
orientation graph of the database image conform to the corresponding edges of the spatial- 
orientation graph of the query image. The proposed algorithm is robust in the sense that it can deal 
with translation, scale, and rotational variances in images. The algorithm has quadratic time 
complexity in terms of the total number of objects in both the database and query images. We also 
introduce the idea of quantifying a system's retrieval quality by having an expert specify the expected 
rank ordering with respect to each query for a set of test queries. This enables us to assess the 
quality of algorithms comprehensively for retrieval in image databases. The characteristics of the 
proposed algorithm are compared with those of the previously available algorithms using a testbed of 
images. The comparison demonstrated that our algorithm is not only more efficient but also provides 
a rank ordering of images that consistently matches with the expert's expected rank ordering. 
ABSTRACT

We developed a prototype information retrieval system which uses advanced natural language 
processing techniques to enhance the effectiveness of traditional key-word based document retrieval. 
The backbone of our system is a statistical retrieval engine which performs automated indexing of 
documents, then search and ranking in response to user queries. This core architecture is augmented 
with advanced natural language processing tools which are both robust and efficient. In early 
experiments, the augmented system has displayed capabilities that appear to make it superior to the 
purely statistical base. 



REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has 
opted to expose the complete List rather than only correct and linked references. 

1 Kenneth Ward Church, Patrick Hanks, Word association norms, mutual information, and
lexicography, Computational Linguistics, v.16 n.1, p.22-29, March 1990

2 W. Bruce Croft, Howard R. Turtle, David D. Lewis, The use of phrases and structured queries in
information retrieval, Proceedings of the 14th annual international ACM SIGIR conference on
Research and development in information retrieval, p.32-45, October 13-16, 1991, Chicago, Illinois,
United States

3 C. J. Crouch, A cluster-based approach to thesaurus construction, Proceedings of the 11th annual
international ACM SIGIR conference on Research and development in information retrieval,
p.309-320, May 1988, Grenoble, France

4 Fagan, Joel L. 1987 Experiments In Automated Phrase Indexing for Document Retrieval: A 



http ://portaLacm.org/citation.cfin?coll=GUIDE&dl=GUIDE&id=98 1 98 1 



9/21/06 



Information retrieval using robust natural language processing 



Page 2 of 4 



Comparison of Syntactic and Non-Syntactic Methods. Ph.D. Thesis, Department of Computer Science,
Cornell University.

5 Ralph Grishman, Lynette Hirschman, Ngo Thanh Nhan, Discovery procedures for sublanguage
selectional patterns: initial experiments, Computational Linguistics, v.12 n.3, p.205-215,
July-September 1986

6 Grishman, Ralph and Tomek Strzalkowski. 1991. "Information Retrieval and Natural Language 
Processing." Position paper at the workshop on Future Directions in Natural Language Processing In 
Information Retrieval, Chicago. 

7 D. Harman, Towards interactive query expansion, Proceedings of the 11th annual international
ACM SIGIR conference on Research and development in information retrieval, p.321-331, May 1988,
Grenoble, France

8 Harman, Donna and Gerald Candela. 1989. "Retrieving Records from a Gigabyte of text on a 
Minicomputer Using Statistical Ranking." Journal of the American Society for Information Science, 41 
(8), pp. 581-589. 

9 Harris, Zelig S. 1991. A Theory of Language and Information: A Mathematical Approach.
Clarendon Press, Oxford.

10 Harris, Zelig S. 1982. A Grammar of English on Mathematical Principles. Wiley. 

11 Harris, Zelig S. 1968. Mathematical Structures of Language. Wiley. 

12 Donald Hindle, Noun classification from predicate-argument structures, Proceedings of the 28th
annual meeting on Association for Computational Linguistics, p.268-275, June 06-09, 1990,
Pittsburgh, Pennsylvania

13 D. D. Lewis, W. B. Croft, Term clustering of syntactic phrases, Proceedings of the 13th annual
international ACM SIGIR conference on Research and development in information retrieval,
p.385-404, September 05-07, 1990, Brussels, Belgium

14 Michael L. Mauldin, Retrieval performance in Ferret a conceptual information retrieval system,
Proceedings of the 14th annual international ACM SIGIR conference on Research and development in
information retrieval, p.347-355, October 13-16, 1991, Chicago, Illinois, United States

15 Naomi Sager, Natural Language Information Processing: A Computer Grammar of English and
Its Applications, Addison-Wesley Longman Publishing Co., Inc., Boston, MA, 1981

16 Gerard Salton, Automatic text processing: the transformation, analysis, and retrieval of
information by computer, Addison-Wesley Longman Publishing Co., Inc., Boston, MA, 1989

17 Shannon, C. E. 1948. "A mathematical theory of communication." Bell System Technical Journal, 
vol. 27, July-October. 

18 A. F. Smeaton, C. J. van Rijsbergen, Experiments on incorporating syntactic processing of user
queries into a document retrieval strategy, Proceedings of the 11th annual international ACM SIGIR
conference on Research and development in information retrieval, p.31-51, May 1988, Grenoble,
France

19 Sparck Jones, Karen. 1972. "Statistical interpretation of term specificity and its application in
retrieval." Journal of Documentation, 28(1), pp. 11-20.

20 Sparck Jones, K. and E. O. Barber. 1971. "What makes automatic keyword classification
effective?" Journal of the American Society for Information Science, May-June, pp. 166-175.



21 Sparck Jones, K. and J. I. Tait. 1984. "Automatic search term variant generation." Journal of
Documentation, 40(1), pp. 50-66.

22 Tomek Strzalkowski, Barbara Vauthey, Fast text processing for information retrieval,
Proceedings of the workshop on Speech and Natural Language, p. 346-352, February 19-22, 1991,
Pacific Grove, California

23 Strzalkowski, Tomek and Barbara Vauthey. 1991. "Natural Language Processing in Automated 
Information Retrieval." Proteus Project Memo #42, Courant Institute of Mathematical Science, New 
York University. 

24 Tomek Strzalkowski, TTP: a fast and robust parser for natural language, Proceedings of the 14th
conference on Computational linguistics, August 23-28, 1992, Nantes, France

25 Wilks, Yorick A., Dan Fass, Cheng-Ming Guo, James E. McDonald, Tony Plate, and Brian M.
Slator. 1990. "Providing machine tractable dictionary tools." Machine Translation, 5, pp. 99-154.



CITINGS 10

Joe Zhou, Troy Tanner, Construction and visualization of key term hierarchies, Proceedings of the
fifth conference on Applied natural language processing, p. 307-311, March 31-April 03, 1997,
Washington, DC

Tomek Strzalkowski, Natural language information retrieval: TIPSTER-2 final report, Proceedings of a
workshop on held at Vienna, Virginia: May 6-8, 1996, May 06-08, 1996, Vienna, Virginia

Tomek Strzalkowski, Document representation in natural language text retrieval, Proceedings of the
workshop on Human Language Technology, March 08-11, 1994, Plainsboro, NJ

Tomek Strzalkowski, TTP: a fast and robust parser for natural language, Proceedings of the 14th
conference on Computational linguistics, August 23-28, 1992, Nantes, France

Tomek Strzalkowski, Building a lexical domain map from text corpora, Proceedings of the 15th
conference on Computational linguistics, August 05-09, 1994, Kyoto, Japan

Alan M. Buckeridge, Richard F. E. Sutcliffe, Disambiguating noun compounds with latent semantic
indexing, COLING-02 on COMPUTERM 2002: second international workshop on computational
terminology, p. 1-7, August 31, 2002

Suzanne Liebowitz Taylor, Deborah A. Dahl, Mark Lipshutz, Carl Weir, Lewis M. Norton, Roslyn
Nilson, Marcia Linebarger, Integrated text and image understanding for document understanding,
Proceedings of the workshop on Human Language Technology, March 08-11, 1994, Plainsboro, NJ

Christian Jacquemin, Recycling terms into a partial parser, Proceedings of the fourth conference on
Applied natural language processing, October 13-15, 1994, Stuttgart, Germany

Marcia C. Linebarger, Lewis M. Norton, Deborah A. Dahl, A portable approach to last resort parsing
and interpretation, Proceedings of the workshop on Human Language Technology, March 21-24,
1993, Princeton, New Jersey

Chengxiang Zhai, Fast statistical parsing of noun phrases for document indexing, Proceedings of the
fifth conference on Applied natural language processing, p. 312-319, March 31-April 03, 1997,
Washington, DC

Collaborative Colleagues: 

Tomek Strzalkowski: Amit Bagga, Ping Peng, Xinyang Zhang




Leonard Bolc
Nick Cercone
Hilda Hardy
Paul B. Kantor
Fang Lin
Mihnea Marinescu
Miroslav Martinovic
Kwong Bor Ng
Mark Osborn
Jose Perez-Carballo
Nobuyuki Shimizu
Rong Tang
Liu Ting
Barabara Vauthey
Barbara Vauthey
Jin Wang
Bowden Wise
G. Bowden Wise



Barbara Vauthey: Tomek Strzalkowski

Peer to Peer - Readers of this Article have also read:

• Data structures for quadtree approximation and compression, Communications of the ACM 28, 9
Hanan Samet

• A hierarchical single-key-lock access control using the Chinese remainder theorem, Proceedings
of the 1992 ACM/SIGAPP Symposium on Applied computing
Kim S. Lee, Huizhu Lu, D. D. Fisher

• The GemStone object database management system, Communications of the ACM 34, 10
Paul Butterworth, Allen Otis, Jacob Stein

• Putting innovation to work: adoption strategies for multimedia communication systems,
Communications of the ACM 34, 12
Ellen Francik, Susan Ehrlich Rudman, Donna Cooper, Stephen Levine

• An intelligent component database for behavioral synthesis, Proceedings of the 27th
ACM/IEEE conference on Design automation
Gwo-Dong Chen, Daniel D. Gajski



The ACM Portal is published by the Association for Computing Machinery. Copyright © 2006 ACM, Inc. 

Efficient and effective metasearch for a large number of text databases 



Full text: Pdf (1.04 MB)

Source: Conference on Information and Knowledge Management,
Proceedings of the eighth international conference on Information and knowledge management
Kansas City, Missouri, United States

Pages: 217-224

Year of Publication: 1999

ISBN: 1-58113-146-1

Authors:
Clement Yu, Dept. of EECS, University of Illinois at Chicago, Chicago, IL
Weiyi Meng, Dept. of Computer Science, SUNY - Binghamton, Binghamton, NY
King-Lup Liu, Dept. of EECS, University of Illinois at Chicago, Chicago, IL
Wensheng Wu, Dept. of EECS, University of Illinois at Chicago, Chicago, IL
Naphtali Rishe, School of Computer Science, Florida International University, Miami, FL

Sponsors:
SIGART: ACM Special Interest Group on Artificial Intelligence
SIGIR: ACM Special Interest Group on Information Retrieval
SIGMIS: ACM Special Interest Group on Management Information Systems

Publisher: ACM Press, New York, NY, USA




DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/319950.320005



ABSTRACT

Metasearch engines can be used to help ordinary users retrieve information from multiple
local sources (text databases). In a metasearch engine, the contents of each local database are
represented by a representative. Each user query is evaluated against the set of representatives of all
databases in order to determine the appropriate databases to search. When the number of databases
is very large, say on the order of tens of thousands or more, a traditional metasearch engine
may become inefficient, as each query needs to be evaluated against too many database
representatives. Furthermore, the storage requirement on the site containing the metasearch engine
can be very large. In this paper, we propose to use a hierarchy of database representatives to
improve the efficiency. We provide an algorithm to search the hierarchy. We show that the retrieval
effectiveness of our algorithm is the same as that of evaluating the user query against all database
representatives. We also show that our algorithm is efficient. In addition, we propose an alternative
way of allocating representatives to sites so that the storage burden on the site containing the
metasearch engine is much reduced.
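
The abstract does not give the algorithm itself, so the following Python sketch is only an illustration of the general idea of searching a hierarchy of representatives. It rests on assumptions that are not taken from the paper: a representative is a map from terms to non-negative weights, a parent representative is the per-term maximum over its children, a query is scored by summing the weights of its terms, and node names are unique. The names `Node`, `merge`, `flat_top_k`, and `hierarchical_top_k` are hypothetical. Under these assumptions a parent's score is an upper bound on every descendant's score, so a best-first search returns the same top-k databases as scoring every database representative directly.

```python
import heapq
from dataclasses import dataclass, field
from typing import Dict, List

Rep = Dict[str, float]  # assumed representative: term -> non-negative weight


@dataclass
class Node:
    name: str  # assumed unique across the hierarchy
    rep: Rep
    children: List["Node"] = field(default_factory=list)


def merge(name: str, children: List[Node]) -> Node:
    """Build a parent whose representative is the per-term max of its children."""
    rep: Rep = {}
    for child in children:
        for term, weight in child.rep.items():
            rep[term] = max(rep.get(term, 0.0), weight)
    return Node(name, rep, children)


def score(rep: Rep, query: List[str]) -> float:
    """Sum of the representative's weights for the query terms."""
    return sum(rep.get(term, 0.0) for term in query)


def flat_top_k(leaves: List[Node], query: List[str], k: int) -> List[str]:
    """Baseline: score every database representative, then sort."""
    ranked = sorted(leaves, key=lambda n: (-score(n.rep, query), n.name))
    return [n.name for n in ranked[:k]]


def hierarchical_top_k(root: Node, query: List[str], k: int) -> List[str]:
    """Best-first search; a subtree is expanded only when its bound is the best left."""
    # Heap key: (negated bound, 0 for internal / 1 for leaf, name).
    # On ties, internal nodes are expanded before leaves are reported,
    # which keeps the order identical to flat_top_k.
    heap = [(-score(root.rep, query), int(not root.children), root.name, root)]
    result: List[str] = []
    while heap and len(result) < k:
        _, _, _, node = heapq.heappop(heap)
        if not node.children:
            result.append(node.name)  # a leaf is an actual database
            continue
        for child in node.children:
            key = (-score(child.rep, query), int(not child.children), child.name)
            heapq.heappush(heap, key + (child,))
    return result


# Toy data, invented for illustration only.
leaves = [
    Node("db_sports", {"football": 0.9, "score": 0.4}),
    Node("db_news", {"football": 0.3, "election": 0.8}),
    Node("db_med", {"cancer": 0.9, "trial": 0.5}),
    Node("db_bio", {"gene": 0.7, "trial": 0.2}),
]
root = merge("root", [merge("general", leaves[:2]), merge("science", leaves[2:])])
query = ["football", "score"]
print(hierarchical_top_k(root, query, 2))
print(flat_top_k(leaves, query, 2))
```

With only four databases the hierarchy saves no work; any saving depends on how many subtrees are never expanded, which this toy example does not measure.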
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